
Conversation

@vjanfaza
Contributor

In these changes, instead of passing the CCL lists during model loading, I added a flag called ccl_enabled that specifies whether the CCL feature is enabled, and moved passing the CCL lists to the compilation process.
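For illustration, a minimal sketch of the new flow (using QEfficient's QEFFAutoModelForCausalLM; apart from ccl_enabled, comp_ctx_lengths_prefill, and comp_ctx_lengths_decode, the parameter names and values below are assumptions and may differ from the final API):

```python
# Minimal sketch: ccl_enabled toggles the feature at load time, while the
# compute-context-length (CCL) lists are passed at compile time.
# Parameter names other than the CCL-related ones are assumptions.
from QEfficient import QEFFAutoModelForCausalLM

ctx_len = 1024

qeff_model = QEFFAutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B-Instruct-2507",
    ccl_enabled=True,  # new flag: only enables/disables CCL during loading
)

qeff_model.compile(
    prefill_seq_len=128,  # assumed value for illustration
    ctx_len=ctx_len,
    num_devices=4,        # assumed value for illustration
    comp_ctx_lengths_prefill=[256, 512, ctx_len],  # CCL lists now passed here
    comp_ctx_lengths_decode=[256, 512, ctx_len],
)
```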

@quic-mamta
Contributor

@vjanfaza, can you please resolve the conflicts on this PR and run the lint/format checks?

@vjanfaza
Contributor Author

vjanfaza commented Nov 20, 2025

@vjanfaza, can you please resolve the conflicts on this PR and run the lint/format checks?

I resolved the conflicts and pushed the changes.

comp_ctx_lengths_prefill = [256, 512, ctx_len]
comp_ctx_lengths_decode = [256, 512, ctx_len]
# In MoE models, when compiling with prefill_seq_len=1 in non-continuous-batching mode, prefill and decode will share the same CCL specializations.
comp_ctx_lengths_prefill = [256, 512, ctx_len] # None #
Contributor

nit: please remove the trailing `# None #` from this line, and from the other places/files as well.
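For reference, a small sketch of the shared-specialization case described in the snippet's comment (variable names follow this snippet; reusing the decode list for prefill is my reading of that comment, not something confirmed elsewhere in the PR):

```python
# MoE model compiled with prefill_seq_len=1 in non-continuous-batching mode:
# prefill and decode share the same CCL specializations, so a single list can
# serve both (my reading of the comment above; not confirmed by the PR).
ctx_len = 1024
comp_ctx_lengths_decode = [256, 512, ctx_len]
comp_ctx_lengths_prefill = comp_ctx_lengths_decode
```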


model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
"""
# For CB inference, set continuous_batching to True and add full_batch_size,mxfp6,mint8 argument in compile function
Contributor

nit: this should be `mxint8`, not `mint8`.
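As a hedged illustration of the CB comment above (continuous_batching, full_batch_size, mxfp6_matmul, and mxint8_kv_cache are my guesses at the relevant QEfficient options and may not match the actual argument names):

```python
# Sketch of CB inference per the comment above: enable continuous batching at
# load time, then pass full_batch_size and the mxfp6/mxint8 options to compile().
# Argument names and values here are assumptions and may differ from the real API.
qeff_model = QEFFAutoModelForCausalLM.from_pretrained(
    model_name,
    continuous_batching=True,
    ccl_enabled=True,
)
qeff_model.compile(
    ctx_len=ctx_len,
    full_batch_size=4,      # assumed value for illustration
    mxfp6_matmul=True,      # "mxfp6" in the comment
    mxint8_kv_cache=True,   # "mxint8" (the typo the nit points out)
    comp_ctx_lengths_prefill=comp_ctx_lengths_prefill,
    comp_ctx_lengths_decode=comp_ctx_lengths_decode,
)
```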

comp_ctx_lengths_prefill=comp_ctx_lengths_prefill,
comp_ctx_lengths_decode=comp_ctx_lengths_decode,
)
# mos=1,
Contributor

please remove this line.

processor=processor,
images=image_urls,
generation_len=100,
device_ids=[28, 29, 30, 31],
Contributor

Please make these [0, 1, 2, 3].

inputs["pixel_values"] = inputs["pixel_values"].to(torch.float32)
streamer = TextStreamer(tokenizer)
output = qeff_model.generate(inputs=inputs, device_ids=[0, 1, 2, 3], generation_len=100)
output = qeff_model.generate(inputs=inputs, device_ids=[8, 9, 10, 11], generation_len=100)
Contributor

This should be kept as the original (device_ids=[0, 1, 2, 3]).

vjanfaza force-pushed the CCL-main branch 2 times, most recently from 4461b41 to b8dd26c on December 4, 2025 at 00:54
…ring compilation process

Signed-off-by: Vahid Janfaza <[email protected]>
@quic-hemagnih
Contributor

